High Dimensional Data Clustering Using Fast Cluster Based Feature Selection

Authors

  • Karthikeyan
  • Saravanan
  • Vanitha
Abstract

Feature selection involves identifying a subset of the most useful features that produces results comparable to those of the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. Efficiency concerns the time required to find a subset of features, while effectiveness concerns the quality of that subset. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature, the one most strongly related to the target classes, is selected from each cluster to form a subset of features. Because features in different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt an efficient minimum spanning tree (MST) clustering method based on Kruskal's algorithm. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study.
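The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: absolute Pearson correlation stands in for whatever feature-relatedness measure the paper uses, the clusters are formed by dropping MST edges longer than a fixed `cut` threshold, and the names `pearson`, `kruskal_mst`, `select_features`, and `cut` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def kruskal_mst(n_nodes, edges):
    """Kruskal's algorithm. `edges` holds (weight, u, v); returns the MST edges."""
    parent = list(range(n_nodes))
    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:  # the edge joins two different components, so keep it
            parent[ru] = rv
            mst.append((w, u, v))
    return mst


def select_features(columns, target, cut=0.5):
    """Return one representative feature index per cluster, in sorted order.

    `columns` is a list of feature columns; `cut` is an assumed distance
    threshold above which an MST edge is removed.
    """
    n = len(columns)
    # Step 1: complete graph over features, distance = 1 - |correlation| (assumption).
    edges = [(1.0 - abs(pearson(columns[i], columns[j])), i, j)
             for i, j in combinations(range(n), 2)]
    mst = kruskal_mst(n, edges)
    # Dropping long MST edges leaves connected components, which are the clusters.
    parent = list(range(n))
    for w, u, v in mst:
        if w <= cut:
            parent[find(parent, u)] = find(parent, v)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(parent, i), []).append(i)
    # Step 2: in each cluster, keep the feature most related to the target.
    relevance = [abs(pearson(col, target)) for col in columns]
    return sorted(max(members, key=lambda i: relevance[i])
                  for members in clusters.values())


# Toy data: features 0 and 1 move together, as do features 2 and 3.
features = [
    [1, 2, 3, 4, 5, 6],
    [2, 1, 4, 3, 6, 5],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
]
target = [0, 0, 0, 1, 1, 1]
print(select_features(features, target, cut=0.35))
```

A fixed threshold keeps the sketch short; a faithful implementation would follow the edge-removal rule and relatedness measure defined in the full paper.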

Similar resources

A Pragmatic Application of Clustering-Based Feature Subset Selection Algorithm for High Dimensional Data

With the rapid growth of computational biology and e-commerce applications, high-dimensional data has become common. Thus, mining high-dimensional data is an urgent problem of great practical importance. Dimensionality reduction is vital for high-dimensional data, and to that end a clustering-based feature subset selection algorithm is proposed in this paper. The chara...

Feature Subset Selection In High Dimensional Data By Using Fast Technique

Feature selection involves identifying a subset of the most useful features that produces results comparable to those of the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. A fast clustering-based feature selection algorithm, FAST, works in two steps. In the first step, features are divided into clusters by using graph-theoretic c...

Algorithm For Identifying Relevant Features Using Fast Clustering

In a high-dimensional data set, feature selection involves identifying a subset of the most useful features that produces results comparable to those of the original entire set of features. An algorithm may be evaluated on both its efficiency, which concerns the time required to find a subset of features, and its effectiveness, which concerns the quality of the subset of features. Fast clustering based featur...

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Feature selection for high-dimensional data clustering is a difficult problem because the ground-truth class labels that could guide the selection are unavailable in clustering. Besides, the data may have a large number of features, and the irrelevant ones can ruin the clustering. A novel feature weighting scheme is proposed, in which the weight for each feature is a measure of its contributi...

Feature Subset Selection Algorithm for High-Dimensional Data by using FAST Clustering Approach

Feature selection is the process of identifying the subset of the most useful features that produces results comparable to those of the original set of features. Efficiency and effectiveness are the two measures used to evaluate a feature selection algorithm. Efficiency concerns the time required to find the cluster, while effectiveness concerns the quality of the feature subset. With these criteria, fast cl...

Publication date: 2014